ABSTRACT

An  approach  to  sequential  design  for  estimating  the  root  of  a  nonlinear 
equation  is  described.  It  sets  the  next  design  point  at  the  current  estimate 
of  the  parameter  via  a  parametric  model  and  maximum  likelihood  (or  other 
efficient)  estimation.  For  normal,  binomial  and  Poisson  errors  and  their 
respective  canonical  link  functions,  it  is  close  to  the  Robbins-Monro 
stochastic  approximation  and  thus  enjoys  the  latter's  robustness  against  the 
misspecification of the link function. Some new variations of the
Robbins-Monro scheme are obtained as a consequence.
SIGNIFICANCE AND EXPLANATION


When the saving of sample size is an important consideration, sequential
design  of  experiments  is  often  used.  By  efficiently  utilizing  the  information 
in  the  past  experiments,  it  determines  how  the  next  experiment  should  be 
conducted.  Statistical  theory  for  sequential  designs  has  been  developed  for 
normal  and  binomial  variations.  For  the  problem  of  determining  the  solution 
of  an  unknown  nonlinear  equation,  we  have  developed  a  class  of  sequential 
design  procedures  that  can  handle  very  general  variations  described  by  the 
generalized linear models. In special cases it includes a new adaptive
version of the Robbins-Monro stochastic approximation and a maximum likelihood
recursion scheme for quantal responses. Its relation to the stochastic
approximation  and  the  role  the  link  function  plays  are  studied.  Theoretical 
issues  such  as  consistency,  robustness,  asymptotic  normality  and  second-order 
properties  are  discussed. 
MAXIMUM LIKELIHOOD RECURSION AND STOCHASTIC
APPROXIMATION  IN  SEQUENTIAL  DESIGNS* 

C.  F.  J.  Wu 


1. INTRODUCTION

We are interested in efficient sequential designs for estimating the root
of an unknown nonlinear equation, where the distribution of the responses is
quite general (continuous or discrete). The proposed approach is based on
design updating with the maximum likelihood estimate via a parametric model.
It is dubbed the maximum likelihood (ML) recursion approach. In several
important situations it is shown to be closely related to the stochastic
approximation approach of Robbins and Monro (1951).

The problem can be described as follows. The response $y$ is related to
an underlying "design" variable $x$. Denote the mean of $y$ at $x$ by $M(x)$,
which is unknown to the experimenter. Of special interest is the solution
$x^*$ to the equation $M(x) = p$. In bioassay $x^*$ may be an effective dose
level; in a control system $x^*$ may be an optimal input level. Usually the
distributional form of $y$ is roughly known, e.g., binomial in bioassay.
Denote it by $f(y|\theta)$. The parameter $\theta$ is related to $x$ via a link
function unknown to the experimenter. For univariate $x$, our approach starts
with assuming a parametric link function $\theta = g(\lambda x - \alpha)$ with
$g$ known. An efficient estimate $(\hat\lambda_n, \hat\alpha_n)$ of
$(\lambda, \alpha)$ is obtained based on the first $n$ observations via the
assumed model $f$ and $g$. Under $f$ and $g$, $E(y|x)$ is a function of
$\alpha$ and $\lambda$, denoted by $H(x|\alpha,\lambda)$. It is typically
monotone in $x$. The ML recursion chooses the next design $x_{n+1}$ to satisfy
$H(x_{n+1}|\hat\alpha_n,\hat\lambda_n) = p$. The procedure can be repeated
indefinitely. The idea was first studied in Wu (1985) for binary $y$.

*To appear in "Adaptive Statistical Procedures and Related Topics" (J. Van
Ryzin, ed.), 1986.

Sponsored by the United States Army under Contract No. DAAG29-80-C-0041 and
the A. P. Sloan Foundation.


In Section 2 this approach for normal error and linear link function is
shown to yield a nonadaptive Robbins-Monro (RM) stochastic approximation (1)
if $\lambda$ in the preceding description is fixed. For $\alpha$ and $\lambda$
unknown, it leads to a variation (11) of an adaptive RM scheme. The new scheme
(11) has the same first order asymptotic behavior as the adaptive RM. But the
second order behavior, yet to be investigated, will probably be different. The
RM scheme, without any knowledge of $M$, has desirable asymptotic properties
under weak conditions on $M$. This robustness is therefore shared by the ML
recursion, although the latter is based on the assumption of a possibly
incorrect link function $g(\lambda x - \alpha)$. Its robustness, in the case of
binary data with logit link, is shown to stem from the iteration step
$H(x_{n+1}|\hat\alpha_n,\hat\lambda_n) = p$. See (16) and (17). Section 3
contains other results that link the two approaches. A general description of
the ML recursion approach for generalized linear models is given in Section 4.
Canonical link functions are recommended. In the case of Poisson variation,
the ML recursion based on a canonical link is equivalent to a version of the
RM recursion (1). This connection enables us to study the asymptotic behavior
of the ML recursion. The paper concludes with the pros and cons of the ML
recursion approach relative to the RM recursion approach and points out
potential gains in relating the two approaches. No rigor is attempted
throughout the paper.


2.  ROBBINS-MONRO  PROCEDURE  AND  ITS  VARIATIONS  VIA  LEAST  SQUARES  RECURSION 


In their pioneering paper Robbins and Monro (1951) proposed the following
recursive scheme

$$x_{n+1} = x_n - \frac{c}{n}(y_n - p), \tag{1}$$

for estimating the solution $x^*$ of $M(x^*) = p$, where the observation $y_n$
taken at $x_n$ satisfies $y = M(x) + \varepsilon$ with $E(\varepsilon) = 0$.
The scheme does not assume any knowledge of $M$, which is typically unknown.
Under weak conditions on $M$ and $\varepsilon$, $x_n$ is known to converge to
$x^*$ with probability one as $n \to \infty$ (Robbins and Siegmund, 1971) and
to be asymptotically normal (Sacks, 1958). The optimal choice of $c$ for
minimizing the asymptotic variance of $x_n$ is $(M'(x^*))^{-1}$,
$M'(x^*) \neq 0$.
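
As a rough illustration of (1), the following Python sketch runs the recursion on a simulated response. The function name `robbins_monro`, the response $M(x) = 2x - 1$ and the noise level are assumptions made for this example only; they are not taken from the paper. The constant $c = 0.5$ is the optimal choice $(M'(x^*))^{-1}$ for this particular $M$.

```python
import random


def robbins_monro(observe, x1, p, c, n_steps):
    """Iterate x_{n+1} = x_n - (c / n) * (y_n - p) and return all design points."""
    xs = [x1]
    for n in range(1, n_steps + 1):
        y_n = observe(xs[-1])  # noisy response observed at the current design x_n
        xs.append(xs[-1] - (c / n) * (y_n - p))
    return xs


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical response: M(x) = 2x - 1 plus normal error, so x* = 0.5 when p = 0.
    observe = lambda x: 2.0 * x - 1.0 + rng.gauss(0.0, 0.1)
    path = robbins_monro(observe, x1=0.0, p=0.0, c=0.5, n_steps=2000)
    print(path[-1])  # should settle near x* = 0.5
```

Note that the recursion never evaluates $M$ itself; it only sees the noisy responses, which is the sense in which the scheme assumes no knowledge of $M$.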

The Robbins-Monro (RM) scheme (1) can be interpreted as a recursive
scheme with least squares updating. Let us make a tentative assumption that
$y = \alpha + \beta x + \varepsilon$ and the errors $\varepsilon$ have mean
zero, variance $\sigma^2$ and are uncorrelated. First we consider the simple
case of known $\beta$. For solving the linear equation $\alpha + \beta x = 0$
($p$ is now zero), the parameter of interest is $\theta = -\alpha/\beta$.
Based on the first $n$ observations, the least squares estimate (or the
maximum likelihood estimate if the errors are normal) of $\theta$ is

$$\hat\theta_n = \bar x_n - \beta^{-1}\bar y_n, \tag{2}$$

where $\bar x_n$ and $\bar y_n$ are respectively the means of $x_i$ and $y_i$,
$i = 1, \ldots, n$. If the next observation is taken at the current estimate
$\hat\theta_n$ of $\theta$, the recursive relation

$$x_{n+1} = \bar x_n - \beta^{-1}\bar y_n \tag{3}$$

obtains. It is easy to see that (3) is equivalent to

$$x_{n+1} = x_n - \frac{1}{n\beta}\,y_n, \tag{4}$$

which is the RM recursion (1) with $c = \beta^{-1}$. This equivalence was
pointed out by Lai and Robbins (1979). It is a significant step since it
connects two seemingly distinct approaches to the design problem outlined in
Section 1.
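
The equivalence of the least squares form and the RM form can be checked numerically. The sketch below is an illustration under assumed settings (the function name `compare_ls_and_rm`, the response function and the noise level are all hypothetical): at every step it computes the next design both as $\bar x_n - \beta^{-1}\bar y_n$ and as $x_n - y_n/(n\beta)$ and records the largest discrepancy.

```python
import random


def compare_ls_and_rm(mean_fn, beta, x1, n_steps, noise_sd=0.1, seed=0):
    """Return the design path and the largest gap between the two update forms."""
    rng = random.Random(seed)
    xs, ys = [x1], []
    max_gap = 0.0
    for n in range(1, n_steps + 1):
        ys.append(mean_fn(xs[-1]) + rng.gauss(0.0, noise_sd))  # y_n observed at x_n
        x_ls = sum(xs) / n - (sum(ys) / n) / beta   # least squares form
        x_rm = xs[-1] - ys[-1] / (n * beta)         # RM form with c = 1 / beta
        max_gap = max(max_gap, abs(x_ls - x_rm))
        xs.append(x_ls)
    return xs, max_gap


if __name__ == "__main__":
    # Hypothetical response with true slope 3, while the working slope is beta = 2.
    path, gap = compare_ls_and_rm(lambda x: 3.0 * x - 1.5, beta=2.0, x1=0.0, n_steps=200)
    print(path[-1], gap)
```

The agreement is algebraic, so it holds whether or not the working slope $\beta$ matches the true response, which is the point of the nonparametric interpretation discussed next.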

The  approach  that  leads  to  (3)  is  parametric  in  that  it  is  motivated  by  a 
linear  function  that  links  E(y)  and  x  and,  to  a  lesser  extent,  by  the 
normality  of  errors  (which  makes  the  least  squares  estimator  fully 
efficient).  On  the  other  hand,  the  stochastic  approximation  approach  (4)  is 
nonparametric  in  that  its  asymptotic  performance  is  very  much  independent  of 
the knowledge of $M(x)$. The assumption $y = \alpha + \beta x + \varepsilon$ is useful for
motivating  and  generating  design  procedures.  The  validity  and  performance  of 
the  resulting  design  are  nonetheless  independent  of  the  assumption. 

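So far we have assumed that the slope parameter $\beta$ is known. For
unknown $\beta$, what recursive scheme will the least squares updating approach
lead to? Here

$$x_{n+1} = \bar x_n - \bar y_n/\hat\beta_n \tag{5}$$

and

$$\hat\beta_n = \sum_{1}^{n} y_i(x_i - \bar x_n) \Big/ \sum_{1}^{n} (x_i - \bar x_n)^2 \tag{6}$$

is the regression slope estimate. From (5) and

$$\bar x_n = \bar x_{n-1} + (x_n - \bar x_n)/(n-1) = \bar x_{n-1} + (x_n - \bar x_{n-1})/n, \tag{7}$$

we have the following recursive relation

$$\begin{aligned}
x_{n+1} - x_n &= \bar x_n - \bar x_{n-1} - (\bar y_n/\hat\beta_n - \bar y_{n-1}/\hat\beta_{n-1}) \\
&= (x_n - \bar x_{n-1})/n + \bar y_{n-1}/\hat\beta_{n-1} - \bar y_n/\hat\beta_n \\
&= \Big(1 - \frac1n\Big)\bar y_{n-1}/\hat\beta_{n-1} - \bar y_n/\hat\beta_n .
\end{aligned} \tag{8}$$

The last equality follows from (5) with $n$ replaced by $n-1$. By using the
analogue of (7) for $\bar y_n$, this equals

$$\Big(1 - \frac1n\Big)\bar y_{n-1}\Big(\frac{1}{\hat\beta_{n-1}} - \frac{1}{\hat\beta_n}\Big) - \frac{y_n}{n\hat\beta_n}. \tag{9}$$

It remains to derive an explicit form for the first term of (9). From (7.2.7)
of Goodwin and Payne (1977),

$$\begin{aligned}
\hat\beta_n - \hat\beta_{n-1}
&= \frac{(x_n - \bar x_{n-1})\big(y_n - \bar y_{n-1} - \hat\beta_{n-1}(x_n - \bar x_{n-1})\big)}
        {\frac{n}{n-1}\sum_{1}^{n-1}(x_i - \bar x_{n-1})^2 + (x_n - \bar x_{n-1})^2} \\
&= \frac{(n-1)(x_n - \bar x_{n-1})\,y_n}{n\sum_{1}^{n}(x_i - \bar x_n)^2}.
\end{aligned}$$

The second equality follows from (5). Therefore

$$\begin{aligned}
\Big(1 - \frac1n\Big)\bar y_{n-1}\Big(\frac{1}{\hat\beta_{n-1}} - \frac{1}{\hat\beta_n}\Big)
&= \frac{n-1}{n}\,\bar y_{n-1}\,\frac{\hat\beta_n - \hat\beta_{n-1}}{\hat\beta_{n-1}\hat\beta_n} \\
&= \frac{y_n\,\bar y_{n-1}}{\hat\beta_{n-1}\hat\beta_n}\cdot\frac{(n-1)^2(x_n - \bar x_{n-1})}{n^2\sum_{1}^{n}(x_i - \bar x_n)^2} \\
&= -\frac{y_n}{n\hat\beta_n}\cdot\frac{(n-1)^2(\bar y_{n-1}/\hat\beta_{n-1})^2}{n\sum_{1}^{n}(x_i - \bar x_n)^2}.
\end{aligned} \tag{10}$$

The last equality follows from (5). From (8)-(10) follows a new recursive
scheme

$$x_{n+1} = x_n - \frac{\hat\beta_n^{-1}}{n}\left[1 + \frac{(n-1)^2(\bar y_{n-1}/\hat\beta_{n-1})^2}{n\sum_{1}^{n}(x_i - \bar x_n)^2}\right] y_n. \tag{11}$$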


If the second term inside the square bracket is ignored, (11) reduces to
the RM recursion (1) with $c = \hat\beta_n^{-1}$ and $p = 0$. Such an adaptive
procedure with proper truncation on $\hat\beta_n$ was shown (Lai and Robbins,
1981; Anbar, 1978) to have the same minimal asymptotic variance as the optimal
choice $c = (M'(x^*))^{-1}$.
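
The algebra leading to scheme (11) can be sanity-checked numerically. The following sketch, with hypothetical function names and simulation settings chosen only for illustration, generates designs by the least squares rule $x_{n+1} = \bar x_n - \bar y_n/\hat\beta_n$ and compares each step with the RM-like update whose coefficient is $\hat\beta_n^{-1}/n$ times one plus the correction term $(n-1)^2(\bar y_{n-1}/\hat\beta_{n-1})^2 / (n\sum(x_i - \bar x_n)^2)$. Two distinct starting points are assumed so that the slope estimate is defined from $n = 2$ onward.

```python
import random


def slope_stats(xs, ys):
    """Return (slope estimate, mean of x, sum of squares of x about its mean)."""
    n = len(xs)
    xbar = sum(xs) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    slope = sum(y * (x - xbar) for x, y in zip(xs, ys)) / sxx
    return slope, xbar, sxx


def compare_ls_and_scheme_11(alpha, beta, x1, x2, n_steps, noise_sd=0.1, seed=0):
    """Return the design path and the largest gap between the two update forms."""
    rng = random.Random(seed)

    def observe(x):
        return alpha + beta * x + rng.gauss(0.0, noise_sd)

    xs, ys = [x1, x2], [observe(x1), observe(x2)]
    b_prev, xbar, _ = slope_stats(xs, ys)
    ybar_prev = sum(ys) / 2
    xs.append(xbar - ybar_prev / b_prev)      # x_3 from the least squares rule
    max_gap = 0.0
    for n in range(3, n_steps + 1):
        ys.append(observe(xs[-1]))            # y_n observed at x_n
        b_n, xbar_n, sxx_n = slope_stats(xs, ys)
        ybar_n = sum(ys) / n
        x_ls = xbar_n - ybar_n / b_n          # direct least squares update
        correction = (n - 1) ** 2 * (ybar_prev / b_prev) ** 2 / (n * sxx_n)
        x_rm = xs[-1] - (1.0 / (n * b_n)) * (1.0 + correction) * ys[-1]
        max_gap = max(max_gap, abs(x_ls - x_rm))
        b_prev, ybar_prev = b_n, ybar_n
        xs.append(x_ls)
    return xs, max_gap


if __name__ == "__main__":
    path, gap = compare_ls_and_scheme_11(-1.0, 2.0, x1=0.0, x2=1.0, n_steps=60)
    print(path[-1], gap)
```

The comparison starts at $n = 3$ because the identity relies on $x_n$ itself having been produced by the least squares rule at the previous step.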




We now study the order of magnitude of the "correction term"

$$\frac{(n-1)^2(\bar y_{n-1}/\hat\beta_{n-1})^2}{n\sum_{1}^{n}(x_i - \bar x_n)^2}, \tag{12}$$

which, being positive, makes the adjustment $|x_{n+1} - x_n|$ in (11) bigger
than that in the adaptive RM scheme mentioned above. One can argue
heuristically from the results of Lai and Robbins (1981) that
$(n-1)\bar y_{n-1}^2 = O_p(1)$, $\hat\beta_n \to M'(x^*)$, and
$\sum_{1}^{n}(x_i - \bar x_n)^2 = O_p(\log n)$. The correction term (12) is
therefore of the order $O_p((\log n)^{-1})$. We conjecture that the scheme (11)
with proper truncation on the coefficient of $n^{-1}y_n$ has the same limiting
distribution as the optimal RM scheme (1) with $c = (M'(x^*))^{-1}$ and the
adaptive RM scheme (1) with $c = \hat\beta_n^{-1}$. If $y_i$ is related to
$x_i$ by the simple linear regression model
$y_i = \alpha + \beta x_i + \varepsilon_i$, this was established in Lai and
Robbins (1982) with a different truncation scheme. What the correction term
(12) does to (11) is in the lower order terms. Expand the mean square error of
$x_n$ as

$$E(x_n - x^*)^2 = \frac{a_1}{n} + \frac{a_2}{n b_n} + \text{lower order terms}, \tag{13}$$

where $b_n \to \infty$ as $n \to \infty$. The $a_1$ term is the same for the
three procedures, while the $a_2$ term may differ. Since the scheme (11) is
based on the least squares estimator, it may be second-order optimal (in an
appropriate sense) for nearly linear $M(x)$ and normal errors. Second order
asymptotic results, currently unavailable in the literature, may provide
further insights into those small or moderate sample phenomena not readily
explainable by first-order theory. Such results are found in Section 6 of Wu
(1985). Of course, small sample behavior depends on the location of initial
observations.


The correction term (12) is non-negligible only for small or moderate
$n$ when $\log n$ is not large, $|\bar y_{n-1}| \gg 0$, or $x_1, \ldots, x_n$
are not wide-spread. To make the scheme (11) more robust against poor choice
of the starting value $x_0$ and the motivating linearity and normality
assumptions on $M$ and $\varepsilon$, the correction term (12) should be made
less dependent on the remote past. Such can be achieved by replacing
$\bar y_{n-1}$, $\hat\beta_{n-1}$ and $\sum(x_i - \bar x_n)^2$ in (12) by
weighted versions with more weights on recent observations.


3.  MAXIMUM  LIKELIHOOD  RECURSION  IN  BINARY  EXPERIMENTS 


In  a  binary  experiment  the  outcome  y  is  denoted  by  1  (response)  or  0 
(nonresponse).  The  probability  of  response  is  related  to  a  stress  level  x 
(at  which  the  experimentation  is  performed)  by 

M(x)  =  Prob{y  =  1 | x}  =  E(y|x)  . 

It  is  often  of  interest  to  estimate  the  percentile  x*  of  M(x),  i.e. 

M(x*)  =  p,  0  <  p  <  1.  The  problem  is  that  in  practice  the  form  of  M  is 
often  unknown.  For  expensive  runs,  sequential  experimentation,  if  feasible  in 
practice,  is  called  for  since  the  data  can  be  collected  and  used  in  a  most 
effective  way.  For  related  comments,  see  Wu  (1985). 

The maximum likelihood (ML) recursion approach starts with a parametric
model for the unknown $M(x)$. First we consider a one-parameter model
$H(x - \alpha)$ with parameter $\alpha$ and $H$ known. For estimating the
100$p$th percentile, $H$ is chosen to satisfy $H(0) = p$. That is, if $H$ is
the true model, $\alpha$ is the 100$p$th percentile of $M$. The log likelihood
for the first $n$ observations is

$$\sum_{1}^{n} y_i \log H(x_i - \alpha) + \sum_{1}^{n} (1 - y_i)\log(1 - H(x_i - \alpha)).$$

The maximum likelihood estimate $\hat\alpha_n$ of $\alpha$ satisfies the
equation

$$\sum_{1}^{n} y_i\,\frac{H'(x_i - \alpha)}{H(x_i - \alpha)(1 - H(x_i - \alpha))} = \sum_{1}^{n} \frac{H'(x_i - \alpha)}{1 - H(x_i - \alpha)}.$$

By writing

$$K(x) = \frac{H'(x)}{H(x)(1 - H(x))},$$

the above equation can be expressed as a weighted normal equation

$$\sum_{1}^{n} y_i K(x_i - \hat\alpha_n) = \sum_{1}^{n} H(x_i - \hat\alpha_n) K(x_i - \hat\alpha_n).$$



According to the ML recursion approach, the next design $x_{n+1}$ is chosen
to be the current estimate $\hat\alpha_n$ of $\alpha$ and the preceding
equation becomes

$$\sum_{1}^{n} y_i K(x_i - x_{n+1}) = \sum_{1}^{n} H(x_i - x_{n+1}) K(x_i - x_{n+1}). \tag{14}$$

To obtain a recursive relation between $x_{n+1}$ and $x_n$, (14) gives

$$\begin{aligned}
\sum_{1}^{n}\big[H(x_i - x_{n+1})K(x_i - x_{n+1}) &- H(x_i - x_n)K(x_i - x_n)\big] + pK(0) \\
&= \sum_{1}^{n} y_i\big[K(x_i - x_{n+1}) - K(x_i - x_n)\big] + y_n K(0) \\
&\approx \sum_{1}^{n} y_i K'(x_i - x_n)(x_n - x_{n+1}) + y_n K(0).
\end{aligned}$$

Unless $K' = 0$, $x_{n+1} - x_n$ depends on all the past $x_i$ and $y_i$.
Recall that the RM recursion depends on the past $\{y_i\}$ through $y_n$. Only
when $K' = 0$, $x_{n+1} - x_n$ behaves more like the RM scheme in this regard.
Note that $K' = 0$ iff $H$ is of the logistic form
$(1 + (p^{-1} - 1)e^{-cx})^{-1}$. Another advantage of the logit-based ML
recursion design is that it is less susceptible to poor choice of initial
observations than the probit-based procedure (Sellke, 1986).

Without loss of generality, assume $c = 1$. For the logit assumption,
equation (14) takes the form

$$\sum_{1}^{n} y_i = \sum_{1}^{n}\big(1 + (p^{-1} - 1)e^{-(x_i - x_{n+1})}\big)^{-1},$$

which yields the recursive relation

$$y_n = \sum_{1}^{n} \frac{d\,\big(e^{-(x_i - x_n)} - e^{-(x_i - x_{n+1})}\big)}{\big(1 + de^{-(x_i - x_n)}\big)\big(1 + de^{-(x_i - x_{n+1})}\big)} + p, \qquad d = \frac{1}{p} - 1,$$

or equivalently

$$\sum_{1}^{n} \frac{d\,e^{x_n - x_i}\big(1 - e^{x_{n+1} - x_n}\big)}{\big(1 + de^{x_n - x_i}\big)\big(1 + de^{x_n - x_i}e^{x_{n+1} - x_n}\big)} = y_n - p. \tag{15}$$

The special case $p = \tfrac12$ was given in (13) of Wu (1985). The recursion
(15) defines $x_{n+1} - x_n$ implicitly as a nonlinear function of $y_n - p$
and $\{x_i\}_1^n$. It is qualitatively similar to the RM scheme in two regards.
It pushes $x_n$ in the "right" direction, i.e., $x_{n+1} - x_n$ has the same
sign as $p - y_n$. The step size $|x_{n+1} - x_n|$ gets smaller as $n$
increases (and eventually at the rate $n^{-1}$).
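
The one-parameter logit recursion has no closed form, but each step is a one-dimensional root-finding problem: $x_{n+1}$ solves $\sum y_i = \sum H(x_i - x_{n+1})$ with $H(u) = (1 + de^{-u})^{-1}$. The sketch below solves it by bisection. The function names, the bracket padding and the requirement that the data already contain both a response and a nonresponse (otherwise the equation has no finite solution) are assumptions of this illustration, not part of the paper.

```python
import math
import random


def logistic_H(u, p):
    """Logistic response with H(0) = p, i.e. (1 + d * exp(-u))^{-1}, d = 1/p - 1."""
    d = 1.0 / p - 1.0
    return 1.0 / (1.0 + d * math.exp(-u))


def next_design(xs, ys, p, pad=40.0):
    """Solve sum(y_i) = sum(H(x_i - x)) for x by bisection."""
    total = sum(ys)
    if not 0 < total < len(ys):
        raise ValueError("need at least one response and one nonresponse")
    lo, hi = min(xs) - pad, max(xs) + pad
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        # The sum of H(x_i - x) decreases in x, so a positive gap puts the root to the right.
        if sum(logistic_H(xi - mid, p) for xi in xs) - total > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def ml_recursion(true_M, p, xs, ys, n_steps, seed=0):
    """Run the ML recursion from initial data; returns designs and binary responses."""
    rng = random.Random(seed)
    xs, ys = list(xs), list(ys)
    xs.append(next_design(xs, ys, p))
    for _ in range(n_steps):
        ys.append(1 if rng.random() < true_M(xs[-1]) else 0)  # response at current design
        xs.append(next_design(xs, ys, p))
    return xs, ys


if __name__ == "__main__":
    # Hypothetical true response: logistic with slope 2, so the working model is misspecified.
    M = lambda x: 1.0 / (1.0 + math.exp(-2.0 * (x - 0.3)))
    designs, responses = ml_recursion(M, 0.5, [-1.0, 1.0], [0, 1], n_steps=200)
    print(designs[-1])
```

On any run, consecutive designs can be compared with the responses to see the sign property stated above: the design moves down after a response when $p < 1$ and up after a nonresponse.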

The ML recursion approach can also be applied to the two-parameter
logistic model. The $x_{n+1}$ and $x_n$ obtained in this manner cannot be
related in an exact manner like (15). By using linear approximations, it was
shown (Wu, 1985) that $x_{n+1} - x_n$ can be approximated by the adaptive RM
scheme (1) with $c = \hat\beta_n^{-1}$. On the other hand, a one-parameter
model does not lead to a recursive scheme asymptotically equivalent to the
adaptive RM scheme. This point should be clear from the discussion in Section
2, (3) and (4). For binary data, the recursion (15) based on the one-parameter
logistic model again cannot be approximated by the adaptive RM. This is
because the slope parameter in the two-parameter model plays the role of the
regression slope parameter $\beta$ in the adaptive RM. Without a consistent
estimate of the slope parameter, first-order optimality of the ML recursion in
terms of minimizing $a_1$ in (13) cannot be achieved.

The  logistic  model  has  a  unique  place  for  binomial  data  in  that  its 
likelihood  equation  resembles  the  normal  equation  in  linear  models.  For 
generalized  linear  models  to  be  discussed  in  Section  4,  this  unique  role  is 
played  by  the  canonical  link  function. 

Since  the  RM  recursion  has  desirable  asymptotic  properties  under  weak 
conditions  on  M  and  the  ML  recursion  can  be  approximated  by  a  version  of 


the  RM  scheme,  this  nonparametric  property  may  be  shared  by  the  ML 
recursion. In the following we will use the logistic model to explain why the
ML recursion approach, apparently model-based, is robust against model
misspecification. The following argument is taken from Wu (1985). The
likelihood equation for $\alpha$ is

$$\frac1n\sum_{1}^{n} y_i = \frac1n\sum_{1}^{n} H(x_i|\hat\alpha_n,\hat\lambda_n). \tag{16}$$

Make a rather strong assumption that $\hat\alpha_n \to \alpha^*$,
$\hat\lambda_n \to \lambda^*$ uniformly so that $x_{n+1}$, which satisfies
$H(x_{n+1}|\hat\alpha_n,\hat\lambda_n) = p$, converges to a constant $w$. The
ML recursion, based on the $H$ function, is robust if $w$ satisfies
$M(w) = p$. Recall that $M$ is the unknown true response function which may be
different from $H$. For continuous $M$, the left side of (16) converges to
$M(w)$ a.s., since each $y_i$ is binomial with probability $M(x_i)$. This side
does not depend on the $H$ assumption. The right side of (16) converges to
$H(w|\alpha^*,\lambda^*) = \lim H(x_{n+1}|\hat\alpha_n,\hat\lambda_n) = p$.
Therefore $M(w) = p$. The right side, though starting with the $H$ assumption,
turns out to be equal to the constant $p$ because of the recursion step

$$H(x_{n+1}|\hat\alpha_n,\hat\lambda_n) = p, \tag{17}$$

which is recognized as the source of robustness of the ML recursion
approach. In other words, a possible misspecification in $H$ is "undone" by
the recursion step (17). This robustness claim may not hold for the
estimation of other parameters such as the slope of $M$ at $x^*$.

Another interpretation is that the assumed model $H$ is locally valid
in $x$, whatever the true "global" model is.


4.  EXTENSIONS  TO  GENERALIZED  LINEAR  MODELS 


The ML recursion approach to sequential design can be applied to very
general variations described by a generalized linear model. The response $y$
has a density function

$$\exp\{[y\theta - b(\theta)]/a(\varphi) + c(y,\varphi)\}$$

for some functions $a, b, c$. If $\varphi$ is known, this is an exponential
family with canonical parameter $\theta$. The response $y$ is related to the
variable $x$ through a link function $\eta: \theta \to \eta(\theta)$ such that
the $\eta$ scale is linear in $x$, i.e., $\eta = \lambda x - \alpha$.
Typically the link function is unknown. The mean response $M(x) = E(y|x)$ is

$$M(x) = b'(\theta) = b'(\eta^{-1}(\lambda x - \alpha)).$$

Without knowing $M$, we assume a link function $\xi: \theta \to \xi(\theta)$
so that $\xi = \lambda x - \alpha$ and the mean response function is

$$H(x|\alpha,\lambda) = b'(\theta) = b'(\xi^{-1}(\lambda x - \alpha)),$$

where $\xi^{-1}$ is the inverse function of $\xi$. The likelihood equation
obtained by differentiating, with respect to $\alpha$ and $\lambda$,

$$\sum y_i\,\xi^{-1}(\lambda x_i - \alpha) - \sum b(\xi^{-1}(\lambda x_i - \alpha)),$$

is, by writing $\xi^{-1} = \rho$,

$$\begin{aligned}
\sum y_i\,\rho'(\lambda x_i - \alpha) &= \sum H(x_i|\alpha,\lambda)\,\rho'(\lambda x_i - \alpha), \\
\sum x_i y_i\,\rho'(\lambda x_i - \alpha) &= \sum x_i H(x_i|\alpha,\lambda)\,\rho'(\lambda x_i - \alpha).
\end{aligned} \tag{18}$$

If $\xi(\theta) = \theta$, $\sum y_i$ and $\sum y_i x_i$ are the sufficient
statistics for $\alpha$ and $\lambda$ and (18) resembles the normal equation
in linear models. Such a link function is called a canonical link (McCullagh
and Nelder, 1983). For $N(\mu,\sigma^2)$, $\xi(\mu) = \mu$ is the canonical
link; for binomial with probability $p$, the logit function
$\xi(p) = \log[p(1 - p)^{-1}]$ is the canonical link and
$\xi(p) = \lambda x - \alpha$ gives the logistic function
$p = (1 + e^{-\lambda x + \alpha})^{-1}$. Without a priori reason for choosing
other $H$ functions, the canonical link function is a convenient choice for
the ML recursion approach.
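
To make the remark about canonical links concrete, the following worked reduction takes the likelihood equations with $\rho' $ weights and sets $\xi(\theta) = \theta$, so that $\rho(u) = u$ and $\rho' \equiv 1$. The cumulant functions $b$ listed for the three families are the standard ones (unit variance for the normal case, a single trial for the binomial case) and are stated as background assumptions rather than taken from the text.

```latex
\begin{align*}
\rho' \equiv 1 \;\Longrightarrow\quad
  & \sum_i y_i = \sum_i H(x_i \mid \alpha, \lambda), \qquad
    \sum_i x_i y_i = \sum_i x_i\, H(x_i \mid \alpha, \lambda), \\
  & H(x \mid \alpha, \lambda) = b'(\lambda x - \alpha). \\[4pt]
\text{normal:}\quad   & b(\theta) = \theta^2/2,
                      && H(x \mid \alpha, \lambda) = \lambda x - \alpha, \\
\text{binomial:}\quad & b(\theta) = \log(1 + e^{\theta}),
                      && H(x \mid \alpha, \lambda) = \bigl(1 + e^{-(\lambda x - \alpha)}\bigr)^{-1}, \\
\text{Poisson:}\quad  & b(\theta) = e^{\theta},
                      && H(x \mid \alpha, \lambda) = e^{\lambda x - \alpha}.
\end{align*}
```

In each case the equations match observed and fitted totals, which is the sense in which they resemble the normal equations of linear regression.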

For estimating $x^*$ with $M(x^*) = p$, where $p$ is in the range of the
mean $b'(\theta)$, the ML recursion works as follows. Let $\hat\alpha_n$ and
$\hat\lambda_n$ be a solution to (18) based on the first $n$ observations.
Take the next observation $y_{n+1}$ at $x_{n+1}$ satisfying

$$H(x_{n+1}|\hat\alpha_n,\hat\lambda_n) = p.$$

In the next section, we shall study another special case, the Poisson
distribution.


5.  SEQUENTIAL  POISSON  EXPERIMENTS 


Examples  of  Poisson  variation  include  radiation  counts  and  number  of  jobs 
arriving  in  a  period.  The  associated  x  variable  may  be  the  distance  to  the 
source  of  radiation  or  the  parameter  specification  of  a  queuing  system. 

Here $y$ is a Poisson variable with mean $\mu$. To estimate $x^*$ with
$M(x^*) = E(y|x^*) = p$, we assume a canonical link with one parameter, i.e.,

$$p(y|\mu) \propto \exp(y\ln\mu - \mu), \qquad \ln\mu = x - \alpha. \tag{19}$$

The mean response function according to (19) is $H(x|\alpha) = e^{x-\alpha}$.
The parameter of interest is $x$ satisfying $H(x|\alpha) = p$, i.e.
$e^x = pe^{\alpha}$. By solving the equation

$$\frac{\partial}{\partial\alpha}\sum_{1}^{n}\big[y_i(x_i - \alpha) - e^{x_i - \alpha}\big] = 0,$$

we obtain the maximum likelihood estimate $\hat\alpha_n$ of $\alpha$ through

$$e^{-\hat\alpha_n} = \sum_{1}^{n} y_i \Big/ \sum_{1}^{n} e^{x_i}.$$

If the next design $x_{n+1}$ is chosen to satisfy $H(x_{n+1}|\hat\alpha_n) = p$,

$$e^{x_{n+1}} = pe^{\hat\alpha_n} = p\sum_{1}^{n} e^{x_i} \Big/ \sum_{1}^{n} y_i. \tag{20}$$

Equation (20) for $n$ and $n-1$ gives

$$e^{x_{n+1}} - e^{x_n}
= p\,\frac{e^{x_n}\sum_{1}^{n-1} y_i - y_n\sum_{1}^{n-1} e^{x_i}}{\sum_{1}^{n} y_i\,\sum_{1}^{n-1} y_i}
= -\frac{e^{x_n}}{\sum_{1}^{n} y_i}\,(y_n - p),$$

or equivalently,

$$e^{x_{n+1}} = e^{x_n} - \frac{c_n}{n}(y_n - p), \qquad c_n^{-1} = e^{-x_n}\,\frac1n\sum_{1}^{n} y_i. \tag{21}$$

Equation (21) is an RM recursion in the transformed scale $e^x$.
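
The agreement between the closed-form ML update $e^{x_{n+1}} = p\sum e^{x_i}/\sum y_i$ and its RM form with $c_n = n e^{x_n}/\sum y_i$ can be checked by simulation. In the sketch below the function names, the Poisson sampler, the true mean function and the requirement of a positive first count are assumptions made for illustration only.

```python
import math
import random


def poisson_draw(rng, mean):
    """Knuth's multiplication method; adequate for the moderate means used here."""
    limit, k, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def poisson_ml_recursion(true_mean, p, x1, y1, n_steps, seed=0):
    """Return the design path and the largest gap between the two update forms."""
    if y1 <= 0:
        raise ValueError("the first count must be positive for the update to exist")
    rng = random.Random(seed)
    sum_exp, sum_y = math.exp(x1), float(y1)
    xs = [x1, math.log(p * sum_exp / sum_y)]     # x_2 from the closed form with n = 1
    max_gap = 0.0
    for n in range(2, n_steps + 1):
        x_n = xs[-1]
        y_n = poisson_draw(rng, true_mean(x_n))  # count observed at x_n
        sum_exp += math.exp(x_n)
        sum_y += y_n
        direct = p * sum_exp / sum_y             # closed-form ML update of exp(x)
        c_n = n * math.exp(x_n) / sum_y
        rm_form = math.exp(x_n) - (c_n / n) * (y_n - p)   # RM form in the exp(x) scale
        max_gap = max(max_gap, abs(direct - rm_form))
        xs.append(math.log(direct))
    return xs, max_gap


if __name__ == "__main__":
    # Hypothetical true mean exp(0.8 x + 0.5): the assumed unit slope is misspecified.
    path, gap = poisson_ml_recursion(lambda x: math.exp(0.8 * x + 0.5), 5.0, 0.0, 2, 2000)
    print(path[-1], gap)   # the root of exp(0.8 x + 0.5) = 5 is (ln 5 - 0.5) / 0.8
```

Running sums are kept for $\sum e^{x_i}$ and $\sum y_i$ so that each step costs constant time, mirroring the recursive nature of the RM form.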

By rewriting the ML recursion as an RM recursion, we can draw on the rich
literature on the asymptotic results of the latter procedure. The remaining
section is devoted to a heuristic study of the limiting behavior of (21).
Assume that $x_n$ converges to a constant $z_0$. From the martingale strong
law of large numbers,

$$\frac1n\sum_{1}^{n} y_i \approx \frac1n\sum_{1}^{n} M(x_i) \to M(z_0). \tag{22}$$

Similarly, by letting $n \to \infty$ on both sides of (20),

$$e^{z_0} = p\,e^{z_0}/M(z_0),$$

implying that $M(z_0) = p$ and $z_0 = x^*$, that is $x_n$ is asymptotically
consistent. The asymptotic variance of $e^{x_n}$ as an estimator of $e^{x^*}$
depends on the limiting value of $c_n$ in (21). From (20) and (22),

$$c_n^{-1} \to M(z_0)e^{-z_0} = pe^{-x^*},$$

which is not equal to $M'(x^*)$ and therefore the scheme (21) does not have
minimal asymptotic variance. A rigorous treatment of the consistency and
asymptotic normality of $e^{x_n}$ in (21) is desired. A general result of
Sellke (1985) may be applicable.
6. CONCLUDING REMARKS

The  ML  recursion  approach  to  sequential  designs  is  based  on  ideas  well 
known to statisticians. It is intuitively appealing and easy to understand,
although the stochastic approximation is slightly easier to implement. It is
applicable  to  very  general  distributions.  Special  features  such  as 
discreteness  or  boundedness  of  the  data  are  taken  into  account  through  a 
proper  choice  of  the  likelihood.  In  several  important  situations,  it  is  very 
close  to  the  RM  recursive  scheme  (with  varying  choice  of  the  constant  c)  and 
thus  shares  the  latter's  robustness  against  misspecified  link  function.  If 
the assumed model is correct, it is asymptotically efficient and may also
perform  well  in  small  samples  as  the  maximum  likelihood  estimator  often 
does.  Wu's  (1985)  simulation  results  for  binary  data  suggest  that  it  may  be 
superior  to  the  RM  recursion  in  small  samples. 

Its  major  problem  thus  far  is  the  lack  of  rigorous  theory  on  its 
asymptotic  behavior.  An  attempt  has  been  made  by  Sellke  (1985).  Here  its 
linkage  to  the  RM  scheme  pays  off.  By  rewriting  it  as  an  RM-like  scheme  and 
suitably  bounding  the  constant  c  in  (1),  simple  proofs  of  its  asymptotic 
properties  may  be  obtained  by  drawing  on  the  vast  literature  on  the  RM 
scheme.  Another  gain  due  to  this  linkage  is  in  the  choice  of  stabilizing 
constant.  Adaptive  versions  of  the  ML  or  RM  recursion,  though  asymptotically 
optimal,  may  not  perform  well  in  small  samples  because  of  the  instability 
caused  by  adaptation.  Stability  can  be  achieved  by  putting  a  bound  on  the 
constant  c  in  (1).  The  simulation  results  of  Wu  (1985)  demonstrate  the 
effectiveness  of  this  device  in  reducing  the  small-sample  mean  square  errors 
of  both  procedures.  Cross-fertilization  of  the  two  approaches  may  lead  to 
further  understanding  and  results. 